Document image retrieval with improvements in database quality

نویسنده

  • Hannu Kauniskangas
چکیده

Modern technology has made it possible to produce, process, transmit and store digital images efficiently. Consequently, the amount of visual information is increasing at an accelerating rate in many diverse application areas. To fully exploit this new content-based image retrieval techniques are required. Document image retrieval systems can be utilized in many organizations which are using document image databases extensively. This thesis presents document image retrieval techniques and new approaches to improve database content. The goal of the thesis is to develop a functional retrieval system and to demonstrate that better retrieval results can be achieved with the proposed database generation methods. Retrieval system architecture, a document data model, and tools for querying document image databases are introduced. The retrieval framework presented allows users to interactively define, construct and combine queries using document or image properties: physical (structural), semantic, textual and visual image content. A technique for combining primitive features like color, shape and texture into composite features is presented. A novel search base reduction technique which uses structural and content properties of documents is proposed for speeding up the query process. A new model for database generation within the image retrieval system is presented. An approach for automated document image defect detection and management is presented to build high quality and retrievable database objects. In image database population, image feature profiles and their attributes are manipulated automatically to better match with query requirements determined by the available query methods, the application environment and the user. Experiments were performed with multiple image databases containing over one thousand images. They comprised a range of document and scene images from different categories, properties and condition. The results show that better recall and accuracy for retrieval is achieved with the proposed optimization techniques. The search base reduction technique results in a considerable speed-up in overall query processing. The constructed document image retrieval system performs well in different retrieval scenarios and provides a consistent basis for algorithm development. The proposed modular system structure and interfaces facilitate its usage in a wide variety of document image retrieval applications.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Document Image Retrieval Based on Keyword Spotting Using Relevance Feedback

Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...

متن کامل

Document Analysis And Classification Based On Passing Window

In this paper we present Document analysis and classification system to segment and classify contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in the preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

متن کامل

Camera Based Document Image Retrieval with More Time and Memory Efficient LLAH

In this paper, we propose improvements of our camerabased document image retrieval method with Locally Likely Arrangement Hashing (LLAH). While LLAH has high accuracy, efficiency and robustness, it requires a large amount of memory. It is also required to speed up the retrieval of LLAH for applications to real-time document image retrieval. For these purposes, we introduce the following two imp...

متن کامل

Image Retrieval Using Dynamic Weighting of Compressed High Level Features Framework with LER Matrix

In this article, a fabulous method for database retrieval is proposed.  The multi-resolution modified wavelet transform for each of image is computed and the standard deviation and average are utilized as the textural features. Then, the proposed modified bit-based color histogram and edge detectors were utilized to define the high level features. A feedback-based dynamic weighting of shap...

متن کامل

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model.   Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999